
✨ Add support for EKSConfig LaunchTemplate bootstrapping for AL2023 using nodeadm #5553


Open
AmitSahastra wants to merge 15 commits into main from al2023-launch-template

Conversation


@AmitSahastra (Contributor) commented Jun 17, 2025

Description

This PR adds support for launch-template-based EKSConfig bootstrapping for Amazon Linux 2023 nodes. The EKSConfig controller can now create a bootstrap data secret containing a NodeConfig, which enables using AL2023 images with launch templates.

⚠️ Prior to this change, CAPA always generated cloud-init-style userData, which is incompatible with AL2023. This patch enables EKS-managed bootstrap using nodeadm via a MIME-compliant NodeConfig payload.
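
For orientation, here is a minimal sketch of how such a nodeadm MIME payload can be assembled. This is illustrative Go, not the PR's actual implementation; names and example values are made up, but the boundary `"//"` and the `application/node.eks.aws` content type mirror the generated bootstrap data shown under Testing below.

```go
package main

import (
	"bytes"
	"fmt"
)

// buildNodeadmUserData assembles a nodeadm-compatible multipart MIME
// userdata document. Minimal sketch only; the format matches the
// generated bootstrap data shown later in this PR description.
func buildNodeadmUserData(nodeConfigYAML string) string {
	var buf bytes.Buffer
	buf.WriteString("MIME-Version: 1.0\n")
	buf.WriteString(`Content-Type: multipart/mixed; boundary="//"` + "\n\n")
	buf.WriteString("--//\n")
	buf.WriteString("Content-Type: application/node.eks.aws\n\n")
	buf.WriteString(nodeConfigYAML)
	buf.WriteString("\n--//--\n")
	return buf.String()
}

func main() {
	// Illustrative NodeConfig; real values come from the control plane
	// and the machine pool's launch template.
	nodeConfig := `---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://example.eks.amazonaws.com
    certificateAuthority: <base64-encoded-ca>
    cidr: 10.96.0.0/12`
	fmt.Print(buildNodeadmUserData(nodeConfig))
}
```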

Changes

  • Modified EKSConfig controller to fetch AMI ID from AWSManagedMachinePool's launch template
  • Added proper condition handling for control plane readiness
  • Ensured proper reconciliation when dependencies (AWSManagedControlPlane and AWSManagedMachinePool) become ready

Testing

  • Verified EKSConfig correctly generates a bootstrap data secret using a NodeConfig MIME payload when nodeType: al2023 is specified
  • Confirmed that node labels are properly set with the AMI ID
  • Tested reconciliation behaviour when dependencies are not ready
  • Validated that the controller proceeds with data secret creation once dependencies are ready
  • Example node output:
k get nodes -o wide
NAME                                          STATUS   ROLES    AGE   VERSION                INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION                    CONTAINER-RUNTIME
ip-10-0-183-158.ap-south-1.compute.internal   Ready    <none>   17h   v1.30.11-eks-473151a   10.0.183.158   <none>        Amazon Linux 2023.7.20250527   6.1.134-152.225.amzn2023.x86_64   containerd://1.7.27
ip-10-0-65-54.ap-south-1.compute.internal     Ready    <none>   17h   v1.30.11-eks-473151a   10.0.65.54     <none>        Amazon Linux 2023.7.20250527   6.1.134-152.225.amzn2023.x86_64   containerd://1.7.27
  • EKSConfig with nodeType:
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfig
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"bootstrap.cluster.x-k8s.io/v1beta2","kind":"EKSConfig","metadata":{"annotations":{},"name":"am-clusterctl-eks-2-pool-0-bootstrap","namespace":"default"},"spec":{"nodeType":"al2023"}}
  creationTimestamp: "2025-06-16T12:26:26Z"
  generation: 1
  labels:
    cluster.x-k8s.io/cluster-name: am-clusterctl-eks-2
  name: am-clusterctl-eks-2-pool-0-bootstrap
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachinePool
    name: am-clusterctl-eks-2-pool-0
    uid: c4721a8a-a456-434c-8e3f-a5722a5ffb13
  resourceVersion: "200264"
  uid: d224aca6-da5e-4788-8fa6-c727ace79c25
spec:
  nodeType: al2023
status:
  conditions:
  - lastTransitionTime: "2025-06-17T03:37:53Z"
    status: "True"
    type: Ready

  • Generated bootstrap data:
value: |-
  MIME-Version: 1.0
  Content-Type: multipart/mixed; boundary="//"

  --//
  Content-Type: application/node.eks.aws

  ---
  apiVersion: node.eks.aws/v1alpha1
  kind: NodeConfig
  spec:
    cluster:
      apiServerEndpoint: https://xxxxxxxx.xxx.ap-south-1.eks.amazonaws.com
      certificateAuthority: xxxxx
      cidr: 10.96.0.0/12
      name: default_am-clusterctl-eks-2-control-plane-2
    kubelet:
      config:
        maxPods: 110
        clusterDNS:
        - 10.96.0.10
      flags:
      - "--node-labels=eks.amazonaws.com/nodegroup-image=ami-0930ab0e58973e126,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/nodegroup=am-clusterctl-eks-2-pool-0-bootstrap"

  --//--
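
The `--node-labels` flag above is assembled from the machine pool's AMI ID, capacity type, and nodegroup name. A minimal sketch of such a builder follows; identifiers are assumptions, not the PR's literal code (a review comment later in this thread moves the real node-labels logic from a Go template into a Go function):

```go
package userdata // hypothetical package name

import "strings"

// nodeLabelsFlag builds the --node-labels kubelet flag shown in the
// payload above. Purely illustrative.
func nodeLabelsFlag(amiID, capacityType, nodegroup string) string {
	labels := []string{
		"eks.amazonaws.com/nodegroup-image=" + amiID,
		"eks.amazonaws.com/capacityType=" + capacityType,
		"eks.amazonaws.com/nodegroup=" + nodegroup,
	}
	return "--node-labels=" + strings.Join(labels, ",")
}
```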

  • AWSManagedMachinePool (AWSMMP):
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infrastructure.cluster.x-k8s.io/v1beta2","kind":"AWSManagedMachinePool","metadata":{"annotations":{},"name":"am-clusterctl-eks-2-pool-0","namespace":"default"},"spec":{"amiType":"CUSTOM","awsLaunchTemplate":{"ami":{"id":"ami-0930ab0e58973e126"},"instanceType":"t3.large","sshKeyName":"spectro2024"},"scaling":{"maxSize":5,"minSize":1}}}
  creationTimestamp: "2025-06-16T12:26:26Z"
  finalizers:
  - awsmanagedmachinepools.infrastructure.cluster.x-k8s.io
  generation: 4
  labels:
    cluster.x-k8s.io/cluster-name: am-clusterctl-eks-2
  name: am-clusterctl-eks-2-pool-0
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachinePool
    name: am-clusterctl-eks-2-pool-0
    uid: c4721a8a-a456-434c-8e3f-a5722a5ffb13
  resourceVersion: "200257"
  uid: 20954503-d2eb-430f-933f-977a70f9cba3
spec:
  amiType: CUSTOM
  awsLaunchTemplate:
    ami:
      id: ami-0930ab0e58973e126
    instanceType: t3.large
    marketType: OnDemand
    sshKeyName: sshkey
  capacityType: onDemand
  eksNodegroupName: default_am-clusterctl-eks-2-pool-0
  providerIDList:
  - aws:///xxxx/i-07142dfb97c7exxxx
  - aws:///xxxx/i-0ef83e9b00a9fxxxx
  roleName: eks-nodegroup.cluster-api-provider-aws.sigs.k8s.io
  scaling:
    maxSize: 5
    minSize: 1
  updateConfig:
    maxUnavailable: 1
status:
  conditions:
  - lastTransitionTime: "2025-06-17T03:37:52Z"
    status: "True"
    type: Ready

Impact

These changes allow users to use launch templates with AL2023 images, and to specify custom AMIs through launch templates while maintaining compatibility with CAPA's automatic AMI lookup mechanism.

Related Issues

Fixes #5546

What type of PR is this?

/kind feature


Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

EKSConfig now supports generating MIME-formatted bootstrap data for AL2023 nodes using nodeadm and Launch Templates via AWSManagedMachinePool.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 17, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added needs-priority size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 17, 2025
@k8s-ci-robot (Contributor)

Hi @AmitSahastra. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@AmitSahastra AmitSahastra changed the title Add support for EKSConfig LaunchTemplate bootstrapping for AL2023 using nodeadm ✨ Add support for EKSConfig LaunchTemplate bootstrapping for AL2023 using nodeadm Jun 17, 2025
@AmitSahastra AmitSahastra changed the title ✨ Add support for EKSConfig LaunchTemplate bootstrapping for AL2023 using nodeadm # ✨ Add support for EKSConfig LaunchTemplate bootstrapping for AL2023 using nodeadm Jun 17, 2025
@AmitSahastra AmitSahastra changed the title # ✨ Add support for EKSConfig LaunchTemplate bootstrapping for AL2023 using nodeadm ✨ Add support for EKSConfig LaunchTemplate bootstrapping for AL2023 using nodeadm Jun 17, 2025
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 17, 2025
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 20, 2025
@fiunchinho (Contributor)

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 24, 2025
@AmitSahastra (Contributor, Author)

/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-apidiff-main
/test pull-cluster-api-provider-aws-test

@AmitSahastra force-pushed the al2023-launch-template branch from 119ecfe to 7acf4c3 on July 10, 2025 at 12:35
@k0da commented Jul 14, 2025

LGTM. Running this in my dev environment. So far no further issues.

@rudimk commented Jul 23, 2025

@AmitSahastra bit curious about the use of EKSConfig here. Right now I've got about a hundred plus clusters on CAPA that are using AWSManagedMachinePools but I'm not using an EKSConfig with them. I guess my question is - will I have to plug in an EKSConfig for every cluster, in order to make the switch to AL2023? Also kind of curious as to how that might impact existing managed nodegroups running off AL2.

Really appreciate your work on this, looking forward to having it out soon!

@AmitSahastra (Contributor, Author)

> @AmitSahastra bit curious about the use of EKSConfig here. Right now I've got about a hundred plus clusters on CAPA that are using AWSManagedMachinePools but I'm not using an EKSConfig with them. I guess my question is - will I have to plug in an EKSConfig for every cluster, in order to make the switch to AL2023? Also kind of curious as to how that might impact existing managed nodegroups running off AL2.
>
> Really appreciate your work on this, looking forward to having it out soon!

If you have launch templates enabled in the AWSManagedMachinePool, it should create the EKSConfig.

@rudimk commented Jul 23, 2025

Aye we are using launch templates in our managed machine pools using the AWSLaunchTemplate parameter. Pretty certain no corresponding EKSConfig objects are being created - because I don't see any on our management cluster - hence my question around whether one needs to create EKSConfig objects for each cluster.

@AmitSahastra (Contributor, Author)

Yes, for AL2023 support via CAPA, I added logic that uses nodeType: al2023 inside EKSConfig to generate the correct nodeadm-style userdata. So if you’re planning to switch to AL2023 and rely on CAPA for bootstrap, then yes, you’ll need to start creating EKSConfig objects with that field set.

As for existing AL2-based nodegroups, there’s no impact — they can stay as-is with your current launch templates and don’t require any changes unless you explicitly migrate them to AL2023.

I hope that answers your question.
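
For clarity, the controller-side dispatch this answer implies can be sketched as below. Function and parameter names are assumptions, not the PR's literal identifiers; only the `spec.nodeType: al2023` field is confirmed by this thread.

```go
// selectUserData sketches the branch described above: AL2023 nodes get
// the new nodeadm MIME NodeConfig payload, everything else keeps the
// pre-existing cloud-init style userdata.
func selectUserData(nodeType, nodeadmMIMEPayload, cloudInitPayload string) string {
	if nodeType == "al2023" {
		// nodeadm path introduced by this PR.
		return nodeadmMIMEPayload
	}
	// Default path: unchanged cloud-init bootstrap.
	return cloudInitPayload
}
```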

@rudimk commented Jul 26, 2025

That's baller - thanks a ton @AmitSahastra, appreciate you!

@mloiseleur (Contributor) commented Aug 4, 2025

I have built and tested this PR in a sandbox environment for our use case.
Maybe it works for AWSManagedMachinePool, but it does not support AWSManagedControlPlane with MachineDeployment and EKSConfigTemplate.

It still creates instances with the previous userdata format and so fails to connect.

Example YAML
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: test-3723977939
  namespace: flux-system
spec:
  template:
    spec:
      ami:
        eksLookupType: AmazonLinux2023
      instanceMetadataOptions:
        httpTokens: required
        httpPutResponseHopLimit: 2
      iamInstanceProfile: test-capa-nodes
      instanceType: t4g.medium
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: EKSConfigTemplate
metadata:
  name: test-al2023
  namespace: flux-system
spec:
  template:
    spec:
      nodeType: al2023
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: t00-use1-eks-test-us-east-1a-md0
  namespace: flux-system
spec:
  clusterName: test
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: EKSConfigTemplate
          name: test-al2023
      clusterName: test
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: test-3723977939
      version: v1.32.3
CAPA Logs
I0804 13:55:29.740725       1 eksconfig_controller.go:265] "Generating userdata" 
I0804 13:55:29.740740       1 eksconfig_controller.go:314] "Processing AL2023 node type" 
I0804 13:55:29.741301       1 eksconfig_controller.go:366] "Generating AL2023 userdata" 
I0804 13:55:29.748003       1 eksconfig_controller.go:443] "created bootstrap data secret for EKSConfig" 
I0804 13:55:29.766772       1 eksconfig_controller.go:197] "joinWorker called" [repeated multiple times, without success]

The instance launches successfully, but with the previous userdata format, so it is not able to connect:

I0804 13:55:30.090681       1 awsmachine_controller.go:732] "Creating EC2 instance"
I0804 13:55:30.306418       1 instances.go:135] "Obtained a list of supported architectures for instance type"  [...] supported architectures=["arm64"]
I0804 13:55:30.307461       1 instances.go:135] "Chosen architecture" [...]  instance type="t4g.medium" supported architectures=["arm64"] architecture="arm64"
I0804 13:55:30.353337       1 ami.go:374] "found AMI" [...] id="ami-0cbd69473baf7c079" version="1.32"
I0804 13:55:32.550263       1 awsmachine_controller.go:578] "EC2 instance state changed" state="pending" instance-id="i-xxx
I0804 13:56:03.485209       1 awsmachine_controller.go:578] "EC2 instance state changed" state="running" instance-id="i-xxx"

After further investigation, I can be more precise. The secret is created as expected, with the expected content.
The VM is deployed with userdata created by pkg/cloud/services/secretsmanager/secret.go (the template is in pkg/cloud/services/secretsmanager/secret_fetch_script.go). My guess so far is that the node is supposed to fetch the secret from Secrets Manager, and that is where it is failing.
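
A conceptual sketch of the two userdata modes at play here; all names are hypothetical, and the real logic lives in the files cited above:

```go
// userDataForInstance sketches the behaviour described above: unless
// Secrets Manager is skipped, CAPA ships a small fetch script as the
// instance userdata instead of the bootstrap payload itself, and per
// this report that path fails on AL2023/nodeadm.
func userDataForInstance(insecureSkipSecretsManager bool, bootstrapPayload, secretFetchScript string) string {
	if insecureSkipSecretsManager {
		// Raw payload goes directly into instance userdata, so nodeadm
		// can parse the MIME NodeConfig at boot.
		return bootstrapPayload
	}
	// Default: a cloud-init script that pulls the payload from
	// Secrets Manager.
	return secretFetchScript
}
```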

EDIT: It works when disabling Secrets Manager for userdata in the AWSMachineTemplate:

@@ -11,6 +11,8 @@
       {{- else }}
         eksLookupType: AmazonLinux2023
       {{- end }}
+      cloudInit:
+        insecureSkipSecretsManager: true
       instanceMetadataOptions:
         httpTokens: required
         httpPutResponseHopLimit: 2

@mloiseleur (Contributor)

@AmitSahastra I've done a review with a commit on my fork https://github.com/mloiseleur/cluster-api-provider-aws/tree/al2023-launch-template

What I've done:

  1. Fix maxPods (it was being set to 58 even when the boolean in the CRD was set to true)
  2. Make clusterDNS optional (following this comment; confirmed with my test that it works as expected)
  3. Move node-labels from gotemplate to go func, for readability
  4. Add usage documentation in the book
  5. Tested everything on a sandbox env

May I invite you to add it to your PR? You can apply this file or cherry-pick from my branch.

@AmitSahastra (Contributor, Author)

Thanks @mloiseleur for the detailed review and improvements! I’ve cherry-picked your changes on maxPods, clusterDNS, and the node-labels refactor, and added them to the PR. Also appreciated the test validation and doc additions. 🙌

@k8s-ci-robot (Contributor)

@AmitSahastra: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-aws-test | 155e074 | link | true | /test pull-cluster-api-provider-aws-test |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

expectErr: false,
verifyOutput: func(output string) bool {
return strings.Contains(output, "cidr: 192.168.0.0/16") &&
strings.Contains(output, "maxPods: 58") &&
Review comment (Contributor):

Suggested change
strings.Contains(output, "maxPods: 58") &&
strings.Contains(output, "maxPods: 110") &&

It should be 110 when useMaxPods is set.
That should fix the CI.

@@ -24,6 +24,9 @@ import (

// EKSConfigSpec defines the desired state of Amazon EKS Bootstrap Configuration.
type EKSConfigSpec struct {
// NodeType specifies the type of node (e.g., "al2023")
// +optional
NodeType string `json:"nodeType,omitempty"`
Review comment (Member):

Is there any way to derive this from the AMI being used rather than asking the user to specify in the API?

Comment on lines 25 to +29
// EKSConfigSpec defines the desired state of Amazon EKS Bootstrap Configuration.
type EKSConfigSpec struct {
// NodeType specifies the type of node (e.g., "al2023")
// +optional
NodeType string `json:"nodeType,omitempty"`
Review comment (Member):

This is only required when using AL2023 so perhaps let's add an enum for this and only accept al2023? Something like this:

Suggested change
// EKSConfigSpec defines the desired state of Amazon EKS Bootstrap Configuration.
type EKSConfigSpec struct {
// NodeType specifies the type of node (e.g., "al2023")
// +optional
NodeType string `json:"nodeType,omitempty"`
// +kubebuilder:validation:Enum=al2023
type NodeType string
const (
NodeTypeAL2023 NodeType = "al2023"
)
// EKSConfigSpec defines the desired state of Amazon EKS Bootstrap Configuration.
type EKSConfigSpec struct {
// NodeType specifies the type of node (e.g., "al2023")
// +optional
NodeType NodeType `json:"nodeType,omitempty"`
